Exploring the factors affecting the spread of Covid-19 in the US

With over 8 million confirmed cases and more than 400,000 deaths, SARS-CoV-2 has caused immeasurable damage to our society. Even though we have experienced pandemics in the past and had early warnings about the disease, we were not able to avert this one. Many factors have contributed to the current situation. In this study, we analyze the effects of a few of them, namely the availability of healthcare, economic status, and lockdown measures, on the spread of the disease in the US, so that we can be better prepared in the future.

State-wise and county-wise distribution of cases in the US

In April, we saw exponential growth in both the number of cases and the resulting deaths. The spread started at different times in individual states, but they all followed a similar trend. This also holds true for most nations around the world. To study this trend, we used the NY-Times Covid-19 Data dataset.
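Growth curves of this kind depend on how the counts are reported. Assuming the data holds cumulative counts per state and day, a curve of new cases has to be derived by differencing within each state. The following is a minimal sketch of that step, not the notebook's original code; the column names `date`, `state`, and `cases` are assumptions, and the tiny frame holds made-up numbers for illustration only.

```python
import pandas as pd

# Made-up cumulative counts for two hypothetical states (illustration only).
cumulative = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "state": ["A"] * 3 + ["B"] * 3,
    "cases": [10, 20, 40, 5, 5, 8],
})

def add_new_cases(df):
    """Return a copy with a `new_cases` column: day-over-day change per state."""
    out = df.sort_values(["state", "date"]).copy()
    # diff() within each state; the first day has no predecessor,
    # so its cumulative total is used as that day's new cases.
    out["new_cases"] = out.groupby("state")["cases"].diff().fillna(out["cases"])
    return out

daily = add_new_cases(cumulative)

# National cumulative curve: sum the per-state totals for each date.
national = daily.groupby("date")["cases"].sum()
```

Plotting `national` on a logarithmic y-axis would show a period of exponential growth as a roughly straight line.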

Number of Cases over time

In [19]:
 

Number of deaths over time

In [20]:
 

The first cases in the US were observed in Washington, California, and Arizona, before New York became the hotspot. The disease spread faster in densely populated cities like New York and Los Angeles and put a lot of strain on local healthcare systems. Although other cities have since caught up in the number of cases, they were better able to handle the load because it built up over a longer period.

Number of cases (log scale) across US states between March and June

In [7]:
 

Number of deaths across US between March and June

In [8]:
 

Analysis of correlation between Poverty, Healthcare availability and the spread of disease

To understand the spread of COVID-19 across US counties, we explored factors such as the poverty index, primary health care availability, and unemployment percentage. We gathered the county-level data for the poverty index and healthcare from the USDA Economic Research Service. Upon calculating the feature correlation, we noted that the poverty percentage had a weak negative correlation with the number of cases (-0.059), while the number of primary care physicians had a high positive correlation with the number of cases (0.755). One potential explanation is that a larger population means more primary care physicians (their correlation is 0.964) and, under the hypothesis that the virus spreads faster in densely populated areas, also more cases. The per-capita figures are consistent with this reading: primary care physicians per capita and cases per million are essentially uncorrelated (0.005).

Feature Correlation

In [15]:
# Pairwise Pearson correlations between all columns, shaded by value and shown to 3 decimals
county_level.corr().style.background_gradient(cmap='Reds').format("{:.3f}")
Out[15]:
cfips cases deaths population mortality deaths_per_mil cases_per_mil percent_smokers percent_adults_with_obesity percent_physically_inactive percent_excessive_drinking percent_uninsured num_primary_care_physicians num_mental_health_providers high_school_graduation_rate percent_some_college percent_unemployed percent_children_in_poverty life_expectancy percent_poverty primary_care_per_pop
cfips 1.000 -0.027 -0.016 -0.056 -0.009 -0.103 -0.083 -0.086 -0.031 -0.111 0.075 0.153 -0.055 -0.065 0.151 0.005 -0.107 -0.094 0.035 -0.087 -0.010
cases -0.027 1.000 0.913 0.748 0.082 0.432 0.288 -0.142 -0.172 -0.124 0.078 -0.076 0.755 0.704 -0.098 0.129 -0.012 -0.062 0.153 -0.059 0.148
deaths -0.016 0.913 1.000 0.568 0.111 0.483 0.246 -0.112 -0.150 -0.086 0.062 -0.081 0.577 0.545 -0.090 0.109 -0.004 -0.041 0.128 -0.039 0.122
population -0.056 0.748 0.568 1.000 0.068 0.200 0.112 -0.188 -0.220 -0.224 0.126 -0.076 0.964 0.903 -0.116 0.197 -0.037 -0.113 0.185 -0.104 0.199
mortality -0.009 0.082 0.111 0.068 1.000 0.423 0.066 0.013 -0.016 -0.002 0.025 -0.086 0.071 0.069 -0.031 0.046 0.056 0.012 -0.014 -0.009 0.055
deaths_per_mil -0.103 0.432 0.483 0.200 0.423 1.000 0.553 0.029 0.005 0.021 -0.068 -0.053 0.219 0.208 -0.117 -0.025 0.100 0.121 -0.022 0.113 0.063
cases_per_mil -0.083 0.288 0.246 0.112 0.066 0.553 1.000 0.058 0.049 0.045 -0.076 0.036 0.131 0.121 -0.108 -0.118 0.030 0.113 -0.009 0.134 0.005
percent_smokers -0.086 -0.142 -0.112 -0.188 0.013 0.029 0.058 1.000 0.492 0.534 -0.421 0.093 -0.214 -0.206 -0.096 -0.535 0.439 0.633 -0.712 0.662 -0.222
percent_adults_with_obesity -0.031 -0.172 -0.150 -0.220 -0.016 0.005 0.049 0.492 1.000 0.560 -0.301 0.060 -0.265 -0.260 -0.023 -0.372 0.240 0.380 -0.486 0.368 -0.263
percent_physically_inactive -0.111 -0.124 -0.086 -0.224 -0.002 0.021 0.045 0.534 0.560 1.000 -0.481 0.232 -0.260 -0.254 0.046 -0.491 0.220 0.509 -0.571 0.455 -0.336
percent_excessive_drinking 0.075 0.078 0.062 0.126 0.025 -0.068 -0.076 -0.421 -0.301 -0.481 1.000 -0.348 0.147 0.147 0.041 0.498 -0.297 -0.610 0.525 -0.525 0.220
percent_uninsured 0.153 -0.076 -0.081 -0.076 -0.086 -0.053 0.036 0.093 0.060 0.232 -0.348 1.000 -0.101 -0.116 -0.032 -0.441 0.026 0.395 -0.202 0.341 -0.239
num_primary_care_physicians -0.055 0.755 0.577 0.964 0.071 0.219 0.131 -0.214 -0.265 -0.260 0.147 -0.101 1.000 0.936 -0.117 0.245 -0.061 -0.131 0.218 -0.116 0.273
num_mental_health_providers -0.065 0.704 0.545 0.903 0.069 0.208 0.121 -0.206 -0.260 -0.254 0.147 -0.116 0.936 1.000 -0.134 0.222 -0.048 -0.112 0.193 -0.094 0.238
high_school_graduation_rate 0.151 -0.098 -0.090 -0.116 -0.031 -0.117 -0.108 -0.096 -0.023 0.046 0.041 -0.032 -0.117 -0.134 1.000 0.055 -0.233 -0.207 0.066 -0.214 -0.081
percent_some_college 0.005 0.129 0.109 0.197 0.046 -0.025 -0.118 -0.535 -0.372 -0.491 0.498 -0.441 0.245 0.222 0.055 1.000 -0.384 -0.643 0.535 -0.587 0.412
percent_unemployed -0.107 -0.012 -0.004 -0.037 0.056 0.100 0.030 0.439 0.240 0.220 -0.297 0.026 -0.061 -0.048 -0.233 -0.384 1.000 0.535 -0.382 0.525 -0.119
percent_children_in_poverty -0.094 -0.062 -0.041 -0.113 0.012 0.121 0.113 0.633 0.380 0.509 -0.610 0.395 -0.131 -0.112 -0.207 -0.643 0.535 1.000 -0.653 0.931 -0.250
life_expectancy 0.035 0.153 0.128 0.185 -0.014 -0.022 -0.009 -0.712 -0.486 -0.571 0.525 -0.202 0.218 0.193 0.066 0.535 -0.382 -0.653 1.000 -0.622 0.243
percent_poverty -0.087 -0.059 -0.039 -0.104 -0.009 0.113 0.134 0.662 0.368 0.455 -0.525 0.341 -0.116 -0.094 -0.214 -0.587 0.525 0.931 -0.622 1.000 -0.211
primary_care_per_pop -0.010 0.148 0.122 0.199 0.055 0.063 0.005 -0.222 -0.263 -0.336 0.220 -0.239 0.273 0.238 -0.081 0.412 -0.119 -0.250 0.243 -0.211 1.000
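Raw counts of cases and of physicians both scale with population (0.748 and 0.964 in the table), so per-capita columns give the more informative comparison. The sketch below shows how columns such as `cases_per_mil` and `primary_care_per_pop` could be derived. The source does not show the formulas used for the table, so the definitions here are assumptions, and the four counties are made-up numbers that only illustrate the mechanics.

```python
import pandas as pd

# Made-up counties (illustration only): cases and physicians both grow with population.
toy = pd.DataFrame({
    "population": [10_000, 50_000, 200_000, 1_000_000],
    "cases": [12, 40, 260, 900],
    "num_primary_care_physicians": [5, 30, 110, 620],
})

def add_per_capita(df):
    """Add population-normalized columns (assumed definitions, not confirmed by the source)."""
    out = df.copy()
    out["cases_per_mil"] = out["cases"] / out["population"] * 1_000_000
    out["primary_care_per_pop"] = out["num_primary_care_physicians"] / out["population"]
    return out

toy = add_per_capita(toy)

# Pearson correlation on raw counts versus on per-capita rates.
raw_r = toy["cases"].corr(toy["num_primary_care_physicians"])
per_capita_r = toy["cases_per_mil"].corr(toy["primary_care_per_pop"])
print(f"raw counts r = {raw_r:.3f}, per-capita r = {per_capita_r:.3f}")
```

In this toy frame the raw-count correlation is high simply because both columns track population, while the per-capita correlation is not; the real table shows the same pattern (0.755 versus 0.005).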

Plots showing most affected counties in the US

Counties with most cases

In [11]:
 

Counties with most cases per million

In [12]:
 

Counties with most deaths

In [13]:
 

Counties with most deaths per million

In [14]:
 

Correlation between Mobility of people and spread of disease

One of the most effective measures to slow the spread of disease is social distancing. States implemented different forms of social distancing by closing schools, businesses, and public places. Most states introduced lockdown measures between mid and late March, with varying success in curbing the spread. The effectiveness of a lockdown depends on factors such as the specifics of the policy, the degree to which it is enforced, and the stage at which it is implemented. One way to measure it is the average mobility of people compared to the pre-Covid-19 period. We used the Descartes Labs Mobility Data dataset to plot mobility and the number of cases over time from March through June. All states show decreased mobility during the lockdown period, with states like New York and New Jersey showing less than 10% of their pre-Covid-19 mobility during April. In the following weeks, the curve for new cases starts to flatten, and we believe that lockdown measures were one of the reasons.
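Mobility relative to the pre-Covid-19 period can be expressed as a percentage of a per-state baseline. The sketch below assumes a daily distance measure in a column named `m50` and takes the baseline as the median over a pre-lockdown window. The column name, the window dates, and all numbers are assumptions for illustration, not details confirmed by the source.

```python
import pandas as pd

# Made-up daily mobility for two hypothetical states (illustration only).
mobility = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-20", "2020-02-21", "2020-04-10", "2020-04-11"] * 2),
    "state": ["X"] * 4 + ["Y"] * 4,
    "m50": [10.0, 12.0, 1.1, 2.2, 8.0, 8.0, 4.0, 6.0],
})

def mobility_index(df, baseline_start, baseline_end):
    """Express `m50` as a percentage of each state's median over the baseline window."""
    start, end = pd.Timestamp(baseline_start), pd.Timestamp(baseline_end)
    in_baseline = df["date"].between(start, end)
    baseline = df[in_baseline].groupby("state")["m50"].median()
    out = df.copy()
    # Map each row's state to its baseline and rescale to percent.
    out["mobility_pct"] = 100 * out["m50"] / out["state"].map(baseline)
    return out

# The baseline window is an assumed pre-lockdown period.
indexed = mobility_index(mobility, "2020-02-17", "2020-03-07")

# Average mobility during April, as a percentage of the baseline.
april = indexed[indexed["date"].dt.month == 4].groupby("state")["mobility_pct"].mean()
```

A value of 100 means mobility equal to the baseline, so a state "showing less than 10% mobility" would have `mobility_pct` below 10 on this scale.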

Animated map of average mobility of people compared to the pre-Covid-19 period

In [12]:
 
In [16]:
from IPython.display import Video

Video("../Data/videos/vid.mp4")
Out[16]:

We also used the Stanford Social Distancing dataset, which records county-level lockdown implementation, including attributes such as which kinds of institutions were closed. Post-lockdown plots do not show the exponential growth observed in pre-lockdown plots, which indicates the potential effectiveness of social distancing measures.
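One way to make the pre- versus post-lockdown comparison concrete is to fit a straight line to the logarithm of cumulative cases in each period. The slope is the exponential growth rate per day, and ln 2 divided by the slope gives a doubling time. This is an illustrative technique, not necessarily the one behind the plots, and the series below is synthetic rather than real case data.

```python
import numpy as np

def growth_rate(cases):
    """Slope of log(cases) against day index: the exponential growth rate per day.

    Assumes every entry of `cases` is positive, since the logarithm is taken.
    """
    days = np.arange(len(cases))
    slope, _intercept = np.polyfit(days, np.log(cases), 1)
    return slope

def doubling_time(rate):
    """Days needed for cases to double at a constant exponential growth rate."""
    return np.log(2) / rate

# Synthetic series: doubling every 3 days for 15 days, then every 12 days.
pre = 100 * 2 ** (np.arange(15) / 3)
post = pre[-1] * 2 ** (np.arange(1, 16) / 12)

pre_rate, post_rate = growth_rate(pre), growth_rate(post)
print(f"pre-lockdown doubling time:  {doubling_time(pre_rate):.1f} days")
print(f"post-lockdown doubling time: {doubling_time(post_rate):.1f} days")
```

A longer doubling time after the lockdown date is consistent with a flattening curve, although on its own it does not show that the lockdown caused the change.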

In [23]:
 
In [24]:
 
In [26]:
 
In [27]:
 

Conclusion

Although various factors contribute to the spread of communicable diseases, our study suggests that population density can accelerate the spread of disease and that social distancing is one of the most effective measures to curb it.